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Abstract: This paper derives the nonparametric maximum likelihood estima- 
tor (NPMLE) of a distribution function from observations which are subject to 
both bias and censoring. The NPMLE is obtained by a simple EM algorithm 
which is an extension of the algorithm suggested by Vardi (Biometrika, 1989) 
for size biased data. Application of the algorithm to many models is discussed 
and a simulation study compares the estimator's performance to that of the 
product-limit estimator (PLE). An example demonstrates the utility of the 
NPMLE to data where the PLE is inappropriate. 



1. Introduction 

In this paper, the EM algorithm 4] of Vardi 18] is extended from size-biased to 
VF-biased observations, where W is a known, positive, increasing and right con- 
tinuous function. Specifically, the algorithm finds the distribution function G that 
maximizes 

4=1 M S=i P 

where [i* = J °° W(x)dG(x), G — 1 — G and x%,. ..,x m and yi, . . . , y n are given 
data points. 

The function (jl.ip is proportional to likelihoods that arise in reliability and sur- 
vival studies when data are subject to both bias and censoring. Several authors 
derive the nonparametric maximum likelihood estimator (NPMLE) of G in prob- 
lems where the likelihood is a special case of (11.11) . The size-biased case, W(x) = x, 
appears in cross-sectional sampling if the population is in steady state. An early 
work is Cox [3| who estimates G in the uncensored case (i.e., n = 0). Vardi [18( 
presents four problems that result in likelihood proportional to (|1.1[) with W(x) = x 
and develops a simple EM algorithm to find the NPMLE of G. In an earlier paper 
[l6| , he develops an EM algorithm to estimate the underlying lifetime distribution 
of a renewal process observed in a time window (see also [HI). Wijers suggests 
the EM algorithm for the same window sampling in a population model under 
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the steady state assumptions. When the renewals or birth times are known for all 
sampled individuals, the likelihood is proportional to (ll.ip with W(x) = x + C 
where C is the window width. Motivated by data on HIV infection, Kalbfleisch 
and Lawless Q consider a population model in which entrances are according to 
an inhomogeneous Poisson process. Their likelihood is similar to the uncensored 
part of (jl.ip with W being the cumulative rate of the process. They briefly remark 
on the censored case at the end of their pap er. The random left truncation model 
is commonly used for survival data [12l l2ll . l23j . In that model, two independent 
variables A ~ W and X ~ G are truncated to the region A < X. The variable X 
may be censored by A + C, where C is independent of (^4, X). When the truncation 
distribution is known, the likelihood of the model is proportional to (jl.ip . More 
models that give rise to likelihoods proportional to (|l.ip are reviewed in Section [3l 

In Section [2| we suggest a unified EM algorithm that provides the NPMLE for 
the general likelihood and discuss its convergence properties. Two methods 

for developing the EM algorithm arc considered; the first uses an extension of 
Vardi's multiplicative censoring model and the second utilizes the random left 
truncation model. By presenting the two approaches, the similarity between the two 
seemingly unrelated models is highlighted. In Section [3] we derive the likelihoods 
of the above mentioned papers and other models and show that they are special 
cases of In Section |4] we compare the performances of estimators with full 

and no knowledge on W by simulation. The algorithm is then used to reanalyze the 
Channing House data @ . We complete the paper with discussion in Section [5j 

Throughout the paper, we will refer to and to similar expressions as "like- 
lihood" , although they may be only proportional to the likelihood of the data. We 
will refer to the maximizer of (j 1 . 1 [) as the NPMLE of G. To distinguish, we will call 
the maximizer of G when W is not known the product-limit estimator (PLE). The 
latter is presented in Section 13.41 and is used in the simulation and application for 
comparison purposes. 

2. An EM algorithm 

There are several ways to develop the EM algorithm. We first adopt the approach 
used in 1 1 SI] and solve a seemingly unrelated multiplicative censoring problem (simi- 
lar to [18[ problem A). This is shown to be almost equivalent to our original problem 
of maximizing (jl.ip . The second approach is motivated by cross-sectional samp ling 
where subjects are selected to the sample at a random time point (similar to [l8( 
problem B). It is assumed that ages at sampling are observed for all sampled indi- 
viduals and residual lifetimes are subject to random censoring. Here the censoring 
times are independent, and are independent of the ages at sampling and the residual 
lifetimes. Both approaches eliminate the bias of the data and deal only with cen- 
soring. They demonstrate that bias is a secondary problem for estimation relative 
to censoring (when the bias is known and does not cause identification problems). 
After developing the algorithm, several convergence properties are discussed. 

2.1. Vardi's approach 

Since W(x) > and known, (jl.ip is proportional to 



(2.1) 
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where G w (x) — W{u)dG{u)/ fi* is the W- weighted version of G. The problem 
can be divided into two parts; maximization of (|2.ip for G , and transformation 
using G(dx) oc G w (dx)/W(x). To maximize (|2.1[) . consider first the following mul- 
tiplicative censoring model which extends the problem studied by Vardi [l8| . 

Let X®, . . . , X 1 ^, Z®, . . . , Z® be positive random variables from the distribution 
G° supported on (0, oo) and let Ui, . . . ,U n ~ U(0, 1) where all the random vari- 
ables are independent. Let Y® = W(Zf)Ui (i = 1, . . . ,n). Similar to Vardi who 
solves the problem for the special case W(x) — x, the Y®'s describe the transforma- 
tion from the 'complete data' (X®, . . . , X 1 ^, Z\ 1 . . . , Z®) to the observed 'incomplete 
data'. This transformation is used in the E-step of the EM algorithm. The statis- 
tical problem is of estimating G° using (x®, . . . , x^, y(i ■ ■ ■ , J/„) 5 a realization of 

(vO vO vO v a \ 

, . . . , y\ m7 I I , , . . , I n ), 

First note that the density of Y° with respect to Lebesgue measure is 
fyo(t)= [ -dF w{ZO) (v)= [ WM dG °( v ) 

Jv>t V J{v:W(v)>t} W{V) 

where fy and Fy denote the density and distribution of a random variable V. The 
likelihood of the data (x°, y°) = {x\, . . . , x^, Vi, ■ ■ ■ , Vn) 1S given by: 



C(G°; X ,y ) = l[dG (x°)ll / -^G c 



(»)■ 



By defining W (y) = min{u : W(v) > y} (which exists from the right continuity 
assumption) and recalling that W is increasing, we can rewrite the likelihood as 

(2.2) C(G°; X ,y°)=l[dG (x.)l[ / — — -rfG°(w) 

and the similarity to our original likelihood (|2.1D is apparent. 

When W is strictly increasing, no information is lost by the transformation X a i— > 
W(X°), and Vardi's EM algorithm [18] can be applied to W(Xf ),..., W(X^), 
Y] , . . . , Y°. This gives the NPMLE of F w{x °) and the NPMLE of G° is obtained 
by a simple transformation. Specifically, let t\ < ti < ■ ■ ■ < th be the distinct values 
of x?, ... , x° m , W^iyf), and let be the multiplicity of the X° 

and W^ 1 {Y Q ) samples at tf 

rn n 

& = E 7 ^ = *i>> c- = E^w- i (»?) - 

i=l i=l 

Denote by the mass G° assigns to tj, p = (pi, . . . ,Ph)\ then the problem can be 
rewritten as 



maximize £(p) = J] ( £ ^yPfc ] 
(2.3) subject to 2_,Pj = 1 

3 

Pj >0(j = l,...,h). 
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And an EM step is 



(2.4) pf w = (n + m)- 1 ( & + [Wfo)]- 1 ??" ^ 



Y.i> k W{ti)]- 1 pf d 



where p° ld and p™ 6 ™ are the current and updated estimates of pj . 

The derivation of (|2.4p from (|2.3p holds true whether or not W is strictly increas- 
ing. Maximization of (I2.2[) reduces to the discrete problem (|2.3p by showing that 
the support of the NPMLE is discrete and determining ti,.. This somewhat 
technical point is deferred to the Appendix for the case of a general non-decreasing 
W. 

Finally, the connection between G° and G w is apparent from (|2.1[) and (|2.2[) . 
where the only difference appears in the left limits of the integrals. These were used 
only to determine the points t\,...,th and their multiplicity in the two samples. 
Thus, to use the algorithm for maximizing (|2.ip . one needs to define t%, . . . ,th as 
the distinct values of Xx, ■ ■ ■ , x m , y\,...,y n (the original data) and to change £j and 
Q accordingly. The support of our original problem is a subset of the observations 
as discussed in the Appendix. After finding it, the problem reduces to (12.3[) for 
which the algorithm (|2.4[) derives the NPMLE of G w . The corresponding estimate 
of G is achieved by the inversion formula dG{x) oc [W(x)]~ 1 dG w (x), as mentioned 
above. 

The Kaplan-Meier estimate of G [8| is obtained by using W = 1 in (|2.4p (see 
Section I5TT1 for more details). This estimator does not use the correct weights when 
it redistributes the mass of the censored observations, and in general it is inappro- 
priate. 



2.2. A direct approach 

Cross-sectional samples of lifetimes usually contain the age at sampling A and the 
residual lifetime R; the latter is subject to random censoring. Vardi's approach 
uses the sufficient statistic A + R [see (|2.7p below] to develop an EM algorithm. 
In this subsection, the statistic (A, R) is used. The two approaches give different 
perspectives about the formation of the bias and censoring. 

Assume that W(0) = 0, W{t)G{t) -> as t -> oo, and J °° W{dt)G(dt) = 0. 
Mathematically this means that [i* = J °° W(t)G(dt) = J °° G{t)W(dt). Practically 
it means that the probability of leaving the population at the very instant of sam- 
pling is zero; hence, one does not need to worry about inclusion or exclusion of such 
observations in the sample. 
Let 

(2.5) f A (a) = ^-dW(a), a > 0, 



(2.6) Wr|a) = ^fe±^, r>0 



so the joint density at (a, r) is 



/■'■ 

G(a) 



(2.7) f A<R (a, r) = dG{a + r) dW(a), a > 0, r > 0. 



Now suppose we have one sample of m pairs (ojj, r,) from /a,r and another indepen- 
dent sample of n variables yj from f A . This describes the so-called incomplete data 
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and by denoting xi = a, + r, (i = 1, . . . , to) we arrive at the likelihood (jl.ip . Since 
VF is known, the product Yii <2W(o») dW(yj) is irrelevant for maximization. 

The complete data are of course Xi, . . . , x m and y% + f%, . . . , y n + r n , where fj 
is the unobserved residual lifetime of subject j of the second sample. Using the 
sum Xi = a,i + Ti instead of its components is justified by noticing that the sum 
is the sufficient statistic for the complete data problem. By variables changing and 
integrating a out in (|2.7p , it can be easily verified that the likelihood of X4 (or the 
density of y 3 + fj ) is 

(2 8) 11 ;.r, ;■</(,';.)■, j 

For the support points tx, ■ ■ ■ , th described in the appendix, the E-step uses (|2.6[) : 

(2.9) E nold (I{A t + Ri = t 3 }\A t = y t ) = \ 7fe > Vl }, 

where U° ld = « d , . . . , 7r,f d ) is the current estimate of the unbiased distribution 
G, i.e., the estimate at tj of the weighted distribution G w is p° ld cx W(tj)-K° ld . 

The complete likelihood is a product of terms such as (|2.8p which is the likelihood 
of a weighted sample. An M-step estimates the weighted distribution G w by the 
empirical distribution function. Combining the E-step and the M-step, an iteration 
is given by: 



(2.10) Pr° = ( n + OT )~ 1 Kj + 



where and Q are the multiplicities of uncensored and censored observations, 
respectively. Put nf d = [W(t j )]- 1 pf d / Y.kiW^kYi^Pk d } in the equation above 
to get (pO) . 



2.3. Convergence of the algorithm 

Several properties of the problem and the algorithms are sketched below. The ap- 
pendix shows that for some functions W, such as step functions, the NPMLE is not 
unique and different choices of the support points yield the same (maximum) value 
of the likelihood. The properties below hold for a given choice of support points. 

Property 2.1. Given the points t\ 1 . . . ,th, the maximizer of \2.3\) is unique. 

Proof. We show that the problem can be replaced with a maximization of a strictly 
concave function over a convex region. Following [l8j ]. write qj — Pj/W(tj) , Qj — 
J2k>j Ik- Since W(tj) is constant in the likelihood, we can replace (|2.3j) with: 

h 

maximize log ( JJ^ q^ J Q^ 1 ) 

(2.11) subject to ^qjWitj) <1 

j 

qj >0(j=l,...,h). 
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Now log ( n i= i QjQf) = Ej=i lo gfe) + Ej=i log(Qj), and the assertion fol- 
lows from log of a partial sum being a strictly concave function from M. +h to M, 
and since sum of concave functions is concave (recall that for all j, £j, Q > and 

+ >i). ' ' ' □ 

Property 2.2. Let p n and C n = C(p n ) be the value o/p and the value of the likeli- 
hood assigned by the EM algorithm \2.J$ after its n 'th iteration; then C n converges 
to a point C* . 

Proof. [l| and later [4| show that the likelihood increases in each iteration of the 
EM algorithm. The assertion follows from (|2.3|) that shows that the likelihood is 
bounded from above. □ 

Property 2.3. If the maximizer of \2.3]) assigns positive mass to all points t%, . . . , 
th, then the algorithm \2.1$ converges to that maximizer. 

Proof. This follows from Theorem 4 of [24j by using the uniqueness proved in 
Property 12. II and the fact that £ is a polynomial on the simplex (which establishes 
the regularity conditions). □ 



3. Examples 

Many examples are easily described using the Lexis diagram (see Figure [T|) that 
depicts changes in a population S over time. The horizontal and vertical axes of 
the diagram represent the calendar time and the lifetime or duration of subjects 
in S, respectively. Subjects of S are represented by 45° lines that start at their 
time of entering S and end at leaving. Lines that cross the vertical line x = t 
correspond to the population of iS at t. In particular, a cross-sectional sample at 
time contain those subjects (or lines) that intersect with the line x = 0. For a 
review of the Lexis diagram and its utility for studying different sampling designs 
and population quantities see and [ill ]. 



3.1. Random censorship 

A common sampling plan is of collecting data on subjects entering S during the 
time window [0, C]. This widely used design is known as the random censorship 
model and it is depicted in the top left panel of Figure [TJ According to the model, 
a random sample Tj (i = 1, 2, . . . , m + n) is selected from a distribution G, but 
instead of the Tj's, observations are independent realizations of min(T i ,Ci), where 
Ci ~ Fq is independent of T^. The data contain also the information whether the 
observations were censored or not. 

Denote the uncensored observation by x i, ... , x rn and the censored ones by y\, . . . , 
y„ , then the likelihood of the data is 

m n m n 

(3.1) HdGix^llGiy,) xJIWlI^y- 

i—l J — 1 i—1 j — 1 

The NPMLE is the celebrated Kaplan-Meier estimator Q and is based only on the 
left part of (|3.1|) . This is a special case of (jl.ip with W = 1, and can be solved 
(in somewhat a redundant way) by the EM algorithm (12. 4[) . At convergence, the 
algorithm is exactly the self-consistent estimate of Efron [5( and (|2.4[) illustrates the 
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Fig 1 . Sampling designs in the Lexis diagram. Each 45° line represents a subject in the population. 

The horizontal axis is the calendar time and the vertical axis is duration or lifetime. Observed 

data are depicted as solid lines. 

Top left - random censoring model. I = (0, C) 

Top right - follow-up on cross-sectional sampling. I = (— oo,0). 

Bottom left - window sampling. T = (— oo,C). 

Bottom right - cross-sectional with truncation. I = (—/3, -—a). 



redistributed to the right principle. As Efron shows, it reduces to the non-iterative 
Kaplan-Meier estimator. 

According to Lemma IA.31 an NPMLE assigns mass only to the uncensored obser- 
vations, because W is constant everywhere (if the last observation is censored, then 
it assigns mass also to max{?/j}). This is a well-known feature of the Kaplan-Meier 
estimator. 



3.2. Population models with poisson entrance process 



The likelihood (jl.ip is obtained in many designs where entrances to the studied pop- 
ulation <S are according to a Poisson process. Specifically, the model assumes a Pois- 
son entrance process N(t) on (— oo, C) with rate p(-). The lifetimes, Xi,X 2 , ■ . ., are 
determined by the law G, and N(-), X\, A 2 , . . . are independent. The sample consists 
of all subjects who entered S during X and are in S sometime during (0, C), where T 
is a subset of (-co, C), usually an interval. Figure[T]presents several common exam- 
ples and the corresponding samples. It is assumed that p(x) = A x Ao(a;) where Ao is 
known and that j x Xo(u)G(—u)du — /J,* < oo. For each sampled subject, we observe 
the entrance time a and the possibly censored residual lifetime r with the failure 
indicator 8. Denote by (ai, x%), . . . , (a m , x m ) and (a m+ x, y m +i), ■ ■ ■ , (a m+n , y m +n) 
the data on the m uncensored and n censored observations, then the likelihood is 
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given by 



(3.2) £(g)=n ' n 

j—m-\-l 



dGjx Maj) "j±T G( yj )\ (a 3 ) y (A^*)"+ m 
,jL4_i M* ' (n + m)! ' 



The NPMLE of A/i* is n + m and since Ao is assumed known, the problem reduces 
to maximizing for G the function 

(3.3) £(G|n + rn) = [[— — 11 — ~ 

i=l ^ j=m+l ^ 

which has the form of 1|) . In this case W(x) = f, x c ^ nX Xo(u)du which is con- 
tinuous and increasing. 
Examples: 

A homogeneous poisson process - cross-sectional sampling - X = (— oo,0). Early 
studies of this design arc 

[3 and [|. It is depicted in Figure[T]in the top right panel. 



The sample consists of all subjects who are in S at time and the data contain 
their entrance time and follow-up until time C. Here W(x) = x and //* = MX is 
the mean lifetime. The algorithm (|2.4p for this special case is derived by [H} who 
studies a slightly different model. 

A homogeneous poisson process - window sampling - X = (—oo, C). Here the sam- 
ple consists of the cross-sectional population and those entering during the follow-up 
period (see the bottom left panel of Figure[T]). Thus, there are two samples: i) a size 
biased sample comprising of subjects who entered S before 0, and ii) an unbiased 
sample comprising of those who entered during the time window [0,(7]. Both sam- 
ples are censored at (7. This model is a mixture of the random censorship and the 
cross-sectional models discussed above, with random number of observations from 
each model. Here W(x) = x + C and [i* = /x + (7. The model is studied by [§|; Q 
provides an EM algorithm for it. 

An inhomogeneous poisson process - a window sampling - X = (— oo, (7). Kalbflei- 
sch and Lawless study entrances according to an inhomogeneous process in a 
somewhat different model, mainly focusing on the uncensored case. They consider 
many models in this framework and estimate both the rate function and the dis- 
tribution of lifetimes. Under this model, W(x) — Xo(t)dt is proportional to the 
cumulative rate. 

A truncated poisson process - X = (— 0, —a). Wang [2(| derives the NPMLE of G 
when data are collected only for subjects whose ages at sampling time are in [a, (3\ 
for some known < a < (3. This design is depicted in the bottom right panel of 
Figure [T] and it is natural when data started to be recorded only /3 years ago (in 
that case a = 0) or when there is a specific interest on subjects who entered during 
the period (— j3, —a) (e.g., when S is defined by some epidemic status and a specific 
treatment was used during (— j3, —a)). Wang shows that when entrances to <S are 
according to a homogeneous Poisson process the likelihood is given by 

dG( Xl ) yj G(y 3 ) 



J^G(u)du ^ ^G{u)du 

A simple calculation shows that the likelihood is of the form (|l.ip with /i* = 
E[min(Jf, (3) — a} + so the algorithm (12. 4|) can be applied with W(x) = [min(a;, (3) — 
a] + . We note that in this case W(x) = for x £ [0, a] so implementation of the 
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algorithm looks problematic. However, under this setting, G is not identifiable on 
[0, a] [2(| and one can only hope to estimate G given X > a where W > 0. This 
model is used in Section |4] to analyze the Channing House data [f|. 

3.3. A discrete entrance process 

Mandel and Rinott (unpublished) study a discrete version of the Poisson entrance 
model in which entrances to S occur at fixed time points <jk < ■ • ■ < oi < ci < 0. 
At <7fc, Nk new subjects joining S with lifetimes Xki, ■ ■ ■ , XkN k ■ The model assumes 
that Nk has a Poisson distribution with parameter A x Ao(fc), where Ao is known, 
X ki ~ G (k = 1,...,K; i = l,...,N k ), and {N u . . . , N K , X n , X u , . . . , X KNk } 
are independent. Under this model, the NPMLE is obtained by maximizing (jl.ip 
with 

W(x) = A o( fc )- 

{k:-a k <x] 

Here W is a step function and Lemma [A.3I should be used to determine the support 
of G. When a% = —k, Xq is constant and lifetimes are integer valued, the model is 
a discrete size-biased model. 



3-4- Truncation models 

The left truncation model [H, 21, [23[ assumes that two independent variables 
A ~ W and T ~ G can be observed only on the region A < T. In addition, 
there is a random censoring variable C which is independent of T and satisfies 
P{C > A) = 1. Data comprise of (a*, min(^, Cj), <y (i = 1, . . . , m + n) which are 
m + ri realizations of (A, min(T, C), A) restricted to the region A < T, where A = 1 
if T < C and A = otherwise. Let /i* = P(A < T) = EW(T). Changing the 
notations as in (13.21). the likelihood of the data is 



(3 4) g dW( ai )dG( Xl ) ^ dW(a 3 )G(y 3 ) 

i=l " j=m+l " 

(note the similarity to (|3.2p ). If W is known, then dW(it,) can be omitted from the 
likelihood and the problem of maximizing (|3.4|) is equivalent to maximizing (|l.ip . 
Equation (|3.4p can be reexpressed as 

^ .ii g&P) x ii ^ ii ^ ' 

i=l v 1 ; j=m+l v J ; i=l ^ j= m +l ^ 

where the first term is the likelihood of T\A = a, A < T and the second term is 
the likelihood of A\A < T. If W is completely unknown, maximizing (|3.4|) for G is 
equivalent to maximizing the left term in (|3.5[) and the maximizer is the product- 
limit estimator (PLE) defined by 

g(g) = E?=ii{x l = t} 

i e; t =i <t< Xi }+ y:^ +1 i{ aj <t< yj y 

see [2(J. In the next section, the PLE and the NPMLE are compared on real and 
simulated data. 
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4. Illustration 

Channing House data. The data set comprises of 97 males who were residents of 
the Channing House retirement community in Palo Alto, California @. It contains 
the age at entry to the community and the age at death or censoring accompanied 
by the event indicator. In terms of Section 13. A\ the age at entry is A and the age 
at death is T. The parameter of interest, G, is the distribution of age at death (in 
months) of male residents of the community. 

Wang [2(| estimates the distribution of age (in months) at entry to be uniform 
on (782,1073). Assuming W is the uniform distribution function on (782,1073), 
G(-)/<5(782) was estimated using algorithm fl^U and W(x) = [min(x, 1073)-782]+. 
Note that this is similar to the bias of the truncated Poisson process model described 
at the end of Section [3721 The NPMLE and the PLE are depicted in the right panel 
of Figure[5J The survival under the uniform assumption is estimated to be somewhat 
higher. 

For the analysis in the right panel of Figure [2] only 93 out of the 97 residents 
were used. Using all 97 individuals was problematic since the PLE approaches zero 
after the first two failures. This is shown in the left panel of Figure [2] Also shown is 
the survival estimated by (|2.4|) based on all 97 individuals and after estimating W 
to be U(751, 1073) (by the minimum and maximum age at entry of all 97 residents). 
The NPMLE is seen to be less sensitive to outliers. It shows that (|2.4|) can provide 
reasonable estimates when the PLE fails, a common phenomenon in small truncated 
data sets. 

Simulation. A simulation study was conducted to compare the performances of 
the NPMLE and the PLE under the left truncation model described in Section 
E!H We used the EXP(l) model for both G and W and generated 400 data sets of 
50 observations each. The first row of Figure [3] compares the performance of the 
estimators in terms of log MSE at the deciles of G, calculated by the average over 
the repeated replicates. Since the PLE is not well defined when the risk group is 
empty before the last observation [^H, such data sets were not used to calculate 
the MSE of the PLE. They were used to calculate the MSE of the NPMLE. The 
columns of Figure [3] show the effect of censoring. Lifetime were censored at A + C 
for a fixed C such that the probability of censoring is 10,25 and 50 percent from 




Fig 2. Comparison of the PLE (dashed line) and the NPMLE (solid line) of survival in the 
Channing House community. Left - all 97 individuals. Right - excluding the first two failures. 
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Fig 3. Comparison of log MSE of the NPMLE (solid line) and the PLE (dashed line) calculated 
at deciles. Data were generated using W =EXP(1) . Sample size of 50 (top) and 200 (bottom), 
censoring probability of 0.1,0.25 and 0.5 from left to right. 



left to right. The second row of Figure [3] shows the results of the same analysis 
applied to data sets of 200 observations. These were generated by combining the 
simulated data sets of 50 observations (total of 100 data sets). Figure 0] shows 
the sensitivity of the estimators to the assumption on W . The analysis was done 
assuming the exponential distribution as before while the data were generated using 
W=Gamma(2 ! l). 

The results show that using knowledge on the bias improves estimation by 10%- 
25% in terms of MSE. Increasing the probability of censoring results in a higher 
MSE of both estimators especially in the right tail. The relative performance does 
not change much by censoring, but some indication of better relative performance 
of the NPMLE in the right tail is seen in the simulation with 50% probability of 
censoring. More interesting are the results of the sensitivity study that show that 
the NPMLE is quite sensitive to the assumption on W. In general, the MSE of the 
PLE is smaller than that of the NPMLE. Moreover, the performance of the PLE 
improves when sample size increases from 50 to 200 while the performance of the 
NPMLE does not change. However, the performance of the NPMLE is better than 
that of the PLE in the left tail even when the model is incorrect. This phenomenon 
is seen in simulation with other distributions (not shown) and is probably attributed 
to the small risk group in the left tail that results in unstable estimation. 

In the simulated data sets, the PLE was not well defined in 2% to 20% of the 
samples depending on the setting. 
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Fig 4. Sensitivity analysis. Log MSE of the NPMLE (solid line) and the PLE (dashed line) at 
deciles. Data were generated using W = Gamma(2, 1) and the NPMLE was calculated assuming 
W =EXP(1). Sample size of 50 (top) and 200 (bottom), censoring probability of 0.1,0.25 and 0.5 
from left to right. 



5. Discussion 

The principal aim of this article is to provide a general framework and a unified 
algorithm for problems involving bias and censoring. A secondary aim is to contrast 
Vardi's multiplicative censoring model with truncated data and to compare the 
NPMLE to the PLE. An important question is which of the two estimators to use. 
The PLE cannot be calculated when all observations are censored or when data 
do not contain the truncation times 01, ... , a m+n (see Section 13.41) . and may not 
be well defined in data sets that do contain the truncation times as illustrated by 
the Channing House example. In all of these situations, the NPMLE exists and can 
be used. The simulation study shows that the NPMLE is more efficient when the 
model is correctly specified. However, it also indicates that it is quite sensitive to 
the assumed form of W. The use of the NPMLE, therefore, should be limited to 
situations where there is a theoretical justification for the assumed model of W. 
Furthermore, when data contain truncation times, the assumed form of W can and 
should be tested. Wang [2(| suggests a graphical goo dness-of-fit test by plotting 
an estimate of W versus the assumed model, and 1 101 ] study generalized Pearson 
statistics that can provide formal goodness-of-fit tests for the current model. These 
and other goodness-of-fit tests are studied in 

Algorithm (|2.4p can be nested in more complex algorithms to provide nonpara- 
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metric estimates for other interesting problems. Several examples are given in the 
unpublished PhD dissertation of the author. For example, Wang [l9j studies the 
semi-parametric left truncation model {W £ Wg,G unrestricted}, where We is a 
family of distributions indexed by 9. Her method, however, is applicable only for 
uncensored data and hence is of limited use. Estimates under Wang's model for 
censored data can be obtained by an iterative algorithm that uses (|2.4p in one of 
its steps. Preliminary simulation results reveal better performance of this estimator 
over the PLE. 

The algorithm presented in this article can be easily extended to likelihood of 
the form 



where fi* = W s (x)dG(x) for known increasing and right continuous functions 
W s (s = l,...,S). This likelihood generalized the model of [17| to the multiplica- 
tive censoring case. The complete likelihood in this problem involves products of 
likelihoods from different weight functions. The E-step is equivalent to (|2.9p . and 
the M-step uses Vardi's algorithm for selection bias models [171 ]. 




Appendix A: The support of the NPMLE 

This appendix discusses the determination of the support of the NPMLE for a 
non increasing right continuous function W. Although the NPMLE is not always 
unique (when W has steps) it is shown that there exists an NPMLE that assigns 
mass only to the observed points (Xemma lA.l[) . Furthermore, Lemmas IA. 21 and IA. 31 
characterizes observed points that can be excluded from the support. 
Let W~ 1 (y) = min{v : W(v) > y}, then for the likelihood (|2.2[) we have 

Lemma A.l. There exists an NPMLE of G° that assigns mass only to the critical 
points xx, ...,x m , W' 1 (y 1 ), W~ 1 (y n ). 

Proof. If mass is assigned to points other than the critical ones (i.e., points other 
than Xj or W^ 1 (yi)), then shifting the mass to the closest critical point to the 
left will not decrease the likelihood. The assertion follows after noticing that mass 
assigned to the left of the minimal critical point contributes nothing to the likelihood 
since the integrals have left limits. 

Next, suppose that m = n = 1, W is constant on an interval [a, b) and a < x\ < b 
and W^ 1 (yi) = a. Drawing a step function W and inspecting the likelihood show 
that the NPMLE assigns mass only to x\ . In general, if there exist i and j such that 
W(xi) = W(W~ 1 (yj)), then the likelihood (|2.2[) increases if mass first assigned to 
W~ 1 (yj) is shifted to Xi. This excludes several of the critical points from being in 
the support of the NPMLE of G° and it is summarized by 

Lemma A. 2. Let W _1 (y|x) = mini<j< m {xi : W(xi) — W(W~ 1 (y))} if such Xi 
exists, and W~ 1 (y\x.) = W~ 1 (y) otherwise. Then there exists an NPMLE of \2. j|) 
that assigns mass only to the points x\, ■ ■ ■ , x m , W (yi |x), . . . , W~ 1 (y n \'x.). 

The support of the NPMLE of G w from the likelihood (f2~Tj) is determined by 
arguments similar to those leading to Lemma I A. II and Lemma IA.2I This suggests 
that the support is a subset of {x\, . . . , x m , yi, . . . , y n } (from the integrals appearing 
in the likelihood (|2.1[) it is seen that the support points are the y's and not the 
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(5.1) 
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W~ 1 (y)'s). Moreover, if observations yj < Xi exist such that W(xi) = W(yj), then 
the likelihood increases if mass initially assigned to yj is shifted to Xi. Likewise, 
if there are yj < yj' such that W(yj) — W(yj>) then the likelihood increases if 
mass initially assigned to yj is shifted to yj> . The following lemma summarizes this 
discussion: 

Lemma A. 3. There always exists an NPMLE for 11. 1]) which assigns mass only 
to observed points. The complete observations Xi [i = 1, ... , m) are always points 
of support. A censored observation yj is not a point of support if: (i) there exists 
Xi > yj such that W(xi) — W(yj) or (ii) there exists yj> > yj such that W(yj>) = 
W( Vj ). 

Remark A.l. Lemma [A. 31 tells us which yj's are not points of support and not 
which yj's are. The EM algorithm may assign mass zero to some of the f/j's. One 
can always use the inefficient approach of considering all observations as support 
points and let the algorithm to assign zero mass to the redundant ones. 
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